Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed Allocation Explain API in batch mode #10348

Closed
wants to merge 3 commits into from

Conversation

shiv0408
Copy link
Member

@shiv0408 shiv0408 commented Oct 4, 2023

Description

This Pull request fixes the explainUnassignedShardAllocation method to use the correct batch allocator to find unassigned explain reason in case the batch mode is enabled.

Related Issues

This is part of improvement for #5098
Related to #8098

Dependency

This change is dependent on #8865, which needs to be merged before merging this PR

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • GitHub issue/PR created in OpenSearch documentation repo for the required public documentation changes (#[Issue/PR number])

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2023

Compatibility status:

Checks if related components are compatible with change a730ff5

Incompatible components

Skipped components

Compatible components

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2023

Gradle Check (Jenkins) Run Completed with:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is originally added in #8746. I have added a new test below in this file. The diff will updated once that PR is merged and I rebase my changes on top of that.

.flatMap(List::stream).collect(Collectors.toList());
allShardRoutings.forEach(shard -> assertNull(testGatewayAllocator.getBatchId(shard, shard.primary())));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only following test is added in this PR

@shiv0408 shiv0408 requested a review from msfroh as a code owner October 17, 2023 08:50
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Shivansh Arora <[email protected]>
Copy link
Contributor

❌ Gradle check result for a730ff5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -113,6 +113,17 @@ protected GatewayAllocator() {
this.replicaShardAllocator = null;
}

// for tests
protected void setShardAllocators(PrimaryShardAllocator primaryShardAllocator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a constructor that is exposed for tests.

We expose protected methods for testing a method behaviour per say. For example we have exposed protected static void innerAllocatedUnassigned that takes allocators as argument.

The way you are exposing makes the room potential data loss or state loss for GatewayAllocator state, since GatewayAllocator is a singleton object for that reason itself in the memory. I would highly appreaciate to do away with this function as it can cause a potential state loss leak in GatewayAllocator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did tried writing this test without above code, but could not found any way to do it. If you have any other approach how we test the usecase without using this, happy to try that out.

assertEquals(0, testGatewayAllocator.getNumberOfStoreShardBatches());

// now calling allocation explain to ensure that batches are getting created
testAllocation.debugDecision(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also assert API response is getting returned


@Override
public AllocateUnassignedDecision explainUnassignedShardAllocation(ShardRouting unassignedShard, RoutingAllocation routingAllocation) {
setShardAllocators(primaryShardAllocator, replicaShardAllocator, primaryBatchShardAllocator, replicaBatchShardAllocator);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reference to the above comment on setAllocators, trying add implementing a new GatewayAllocator like this in this test.
https://github.com/Gaurav614/OpenSearch/blob/collab-central/server/src/test/java/org/opensearch/cluster/routing/allocation/AllocationServiceTests.java#L263

I am also open for more suggestions /discussions/ challenges you might have implementing this

Comment on lines +79 to +80
private PrimaryShardAllocator primaryShardAllocator;
private ReplicaShardAllocator replicaShardAllocator;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, let us add back the final modifier and instead of a UT, we can use an Integration test for explainUnassigned in batch mode scenario.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 8, 2024
@ticheng-aws ticheng-aws added the bug Something isn't working label Jan 8, 2024
@ticheng-aws
Copy link
Contributor

Hi @shiv0408, Is this being worked upon? Feel free to reach out to maintainers for further reviews.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 9, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 11, 2024
@sohami sohami added Cluster Manager API Issues with external APIs labels Feb 14, 2024
@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Feb 17, 2024
@shiv0408
Copy link
Member Author

Discarding this PR as the core logic of explainUnassignedShardAllocation written in this PR is already added in #8746. Will add the test added in this PR to the above mentioned PR only.

@shiv0408 shiv0408 closed this Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Issues with external APIs bug Something isn't working Cluster Manager
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants